We propose an empirical likelihood test for the goodness of fit of a class of parametric and
semi-parametric multiresponse regression models. The class includes as special cases
fully parametric models; semi-parametric models, like the multiindex and the partially linear
models; and models with shape constraints. Another feature of the test is that it allows both the
response variable and the covariate to be multivariate, which means that multiple regression curves
can be tested simultaneously. The test also allows the presence of infinite-dimensional nuisance
functions in the model to be tested. It is shown that the empirical likelihood test statistic is
asymptotically normally distributed under certain mild conditions and permits a wild bootstrap
calibration. Despite the large size of the class of models to be considered, the empirical
likelihood test enjoys good power properties against departures from a hypothesized model within
the class.
1. Introduction

Suppose $\{(X_i,Y_i)\}_{i=1}^n$ is a sample of independent and identically distributed random
vectors, where $Y_i$ is a $k$-variate response and $X_i$ a $d$-variate covariate. Let
$m(x) = E(Y_i|X_i = x) = (m_1(x),\ldots,m_k(x))$ be the conditional mean consisting of $k$
regression curves on $R^d$ and $\Sigma(x) = \mathrm{Var}(Y_i|X_i = x)$ be a $k \times k$ matrix
whose values change along with the covariate.
Let $m(\cdot) = m(\cdot,\theta,g) = (m_1(\cdot,\theta,g),\ldots,m_k(\cdot,\theta,g))$ be a working
regression model whose validity one would like to check. The form of $m$ is known up to a
finite-dimensional parameter $\theta$ and an infinite-dimensional nuisance parameter $g$. The model
$m(\cdot,\theta,g)$ includes a wide range of parametric and semi-parametric regression models as
special cases. In the absence of $g$, the model degenerates to a fully parametric model
$m(\cdot) = m(\cdot,\theta)$, whereas the presence of $g$ covers a range of semi-parametric models
including the single or multiindex models and partially linear single-index models. The
class also includes models with qualitative constraints, like additive models and models
with shape constraints. The variable selection problem, the comparison of regression
curves and models for the variance function can be covered by the class of $m(\cdot,\theta,g)$ as
well.

Multiresponse regression is frequently encountered in applications. In compartment 
analysis arising in biological and medical studies as well as chemical kinetics (Atkinson 
and Bogacka (2002)), a multivariate variable is described by a system of differential equa- 
tions whose solutions satisfy multiresponse regression (Jacquez (1996)). Surface designs 
and multivariate random vectors are collected as responses of some controlled variables 
(covariates) of certain statistical experiments. Khuri (2001) proposed using the generalized
linear models for modeling such data and Uciński and Bogacka (2005) studied the
issue of optimal designs with an objective for discrimination between two multiresponse 
system models. The monographs by Bates and Watts ((1988), Chapter 4) and Seber and 
Wild ((1989), Chapter 11) contain more examples of multiresponse regression as well as 
their parametric inference. 

The need for testing multiple curves occurs even in the context of univariate responses 
Yi. Consider the following heteroscedastic regression model: 

$$Y_i = r(X_i) + \sigma(X_i)e_i,$$

where the $e_i$'s are unit residuals such that $E(e_i|X_i) = 0$ and $E(e_i^2|X_i) = 1$, and
$r(\cdot)$ and $\sigma^2(\cdot)$ are, respectively, the conditional mean and variance functions.
Suppose $r(x,\theta,g)$ and $\sigma^2(x,\theta,g)$ are certain working parametric or
semi-parametric models. In this case, the bivariate response vector is $(Y_i, Y_i^2)^T$ and the
bivariate model specification is
$m(x,\theta,g) = (r(x,\theta,g), \sigma^2(x,\theta,g) + r^2(x,\theta,g))^T$.
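The second component of the bivariate specification can be checked directly. The short derivation below is a sketch that uses only the heteroscedastic model and the two moment conditions on the unit residuals stated above; it shows why the conditional mean of the squared response equals the variance function plus the squared mean function.

```latex
\begin{aligned}
E(Y_i^2 \mid X_i = x)
  &= r^2(x) + 2\, r(x)\,\sigma(x)\, E(e_i \mid X_i = x)
     + \sigma^2(x)\, E(e_i^2 \mid X_i = x) \\
  &= r^2(x) + \sigma^2(x).
\end{aligned}
```

Testing the two curves $E(Y_i|X_i = x)$ and $E(Y_i^2|X_i = x)$ jointly is therefore equivalent to testing the mean and variance specifications jointly.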

The aim of the paper is to develop a nonparametric goodness-of-fit test for the hypothesis

$$H_0: m(\cdot) = m(\cdot,\theta,g), \qquad (1.1)$$

for some known $k$-variate function $m(\cdot,\theta,g)$, some finite-dimensional parameter
$\theta \in \Theta \subset R^p$ ($p \ge 1$) and some function $g \in \mathcal{G}$, where
$\mathcal{G}$ is a complete metric space consisting of functions from $R^d$ to $R^q$ ($q \ge 1$).
We will use two pieces of nonparametric statistical hardware,
the kernel regression estimation technique and the empirical likelihood technique, to
formulate a test for $H_0$.

In the case of a single regression curve (i.e., $k = 1$), the nonparametric kernel approach
has been widely used to construct goodness-of-fit tests for the conditional mean
or variance function. Eubank and Spiegelman (1990), Eubank and Hart (1992), Härdle
and Mammen (1993), Hjellvik and Tjøstheim (1995), Fan and Li (1996), Hart (1997)
and Hjellvik, Yao and Tjøstheim (1998) have developed consistent tests for a parametric
specification by employing the kernel smoothing method based on a fixed bandwidth.
Horowitz and Spokoiny (2001) propose a test based on a set of smoothing bandwidths 
in the construction of the kernel estimator. Its extensions are considered in Chen and 
Gao (2007) for time series regression models and in Rodriguez-Poo, Sperlich and Vieu 
(2009) for semi-parametric regression models. Other related references can be found in 
the books by Hart (1997) and Fan and Yao (2003). 

The empirical likelihood (EL) (Owen (1988, 1990)) is a technique that allows the 
construction of a nonparametric likelihood for a parameter of interest in a nonparametric 
or semi-parametric setting. Although it is intrinsically nonparametric, it possesses two
important properties of a parametric likelihood: the Wilks' theorem and the Bartlett
correction. Qin and Lawless (1994) establish EL for parameters defined by estimating 
equations, which is the widest framework for EL formulation. Zhang and Gijbels (2003) 
considered an EL procedure based on a sieve approach, whereas Chen and Cui (2006) 
show that the EL admits a Bartlett correction under this general framework. Hjort, 
McKeague and Van Keilegom (2009) consider the properties of the EL in the presence 
of both finite and infinite-dimensional nuisance parameters as well as when the data 
dimension is high. See Owen (2001) for a comprehensive overview of the EL method and
references therein.

Goodness-of-fit tests based on the EL have been proposed in the literature, which 
include Li (2003) and Li and Van Keilegom (2002) for survival data; Einmahl and McKeague
(2003) for testing some characteristics of a distribution function; and Chen, Härdle
and Li (2003) for conditional mean functions with dependent data. Fan and Zhang (2004)
propose a sieve EL test for testing a varying-coefficient regression model that extends 
the generalized likelihood ratio test of Fan, Zhang and Zhang (2001). They demonstrate 
that the 'Wilks phenomenon' continues to hold under general error distributions. Tri- 
pathi and Kitamura (2003) propose an EL test for conditional moment restrictions. The 
above two tests and the test we are to propose display an interesting diversity in test 
statistic formulations via the EL. The basic idea of the EL is to maximize an objective 
function that is a product of probability weights allocated to observations under certain 
constraints that characterize the functional object to be tested. Fan and Zhang (2004) 
apply kernel smoothing in both the objective function and the constraints, whereas Tri- 
pathi and Kitamura (2003) smooth only the objective function and we will smooth only 
the constraints. Fan and Zhang's and our test statistics are based on first constructing 
local statistics over a range of fixed points and then summing them up to form the final 
test statistic. The formulation in Tripathi and Kitamura (2003) consists of one step with 
a global objective function over the range of the entire sample. A common feature among 
the three formulations is that the test statistics are all asymptotically pivotal (Wilks 
phenomenon). This is due to the EL's ability to internally studentize a statistic via its 
optimization procedure. 

We consider in the present paper tests for a set of multiple semi-parametric regres- 
sion functions simultaneously. Multiple regression curves exist when the response Yi is 
genuinely multivariate, or when Yi is in fact univariate but we are interested in testing 
the validity of a set of feature curves; for example, the conditional mean and condi- 
tional variance at the same time. Empirical likelihood is a natural device to formulate 
goodness-of-fit statistics to test multiple regression curves. This is due to EL's built-in
feature to standardize a goodness-of-fit distance measure between a fully nonparametric 
estimate of the target functional curves and its hypothesized counterparts. This feature 
is well connected to the Wilks phenomenon in the sieve empirical likelihood of Fan and 
Zhang (2004) and the generalized likelihood ratio for additive models in Fan and Jiang 
(2005). The standardization carried out by the EL implicitly uses the true covariance
matrix function, say $V(x)$, of the kernel estimator $\hat m(\cdot)$ to studentize the distance
between $\hat m(\cdot)$ and the hypothesized model $m(\cdot,\theta,g)$, so that the
goodness-of-fit statistic is an integrated Mahalanobis distance between the two sets of
multivariate curves $\hat m(\cdot)$ and $m(\cdot,\theta,g)$. This is attractive as we avoid
estimating $V(x)$, which can be a daunting task when $k$ is larger than 1. When testing multiple
regression curves, there is an intrinsic issue regarding how much each component-wise
goodness-of-fit measure contributes to the final test statistic. The EL distributes the weights
naturally according to $V^{-1}(x)$. And
most attractively, this is done without requiring extra steps of estimation since it comes 
as a by-product of the internal algorithm. This attraction of the empirical likelihood has 
been discovered in Tripathi and Kitamura (2003), who propose an empirical likelihood 
test that can be used to test parametric multiple response regression models. 

The main contribution of the proposed test is its ability to test a large class of regres- 
sion models in the presence of both finite- and infinite-dimensional parameters. The class 
includes as special cases fully parametric models; semi-parametric models, like the multi- 
index and the partially linear models; and models with shape constraints, like monotone 
regression models. It is shown that the EL test statistic is asymptotically normally dis- 
tributed under certain mild conditions and permits a wild bootstrap calibration. Despite 
the large size of the class of models to be considered by the proposed test, the test enjoys 
good power properties against departures from a hypothesized model within the class. 

The paper is organized as follows. In the next section we introduce some notation and 
formulate the EL test statistic. Section 3 is concerned with the main asymptotic results, 
namely the asymptotic distribution of the test statistic both under the null hypothesis 
and under a local alternative, and the consistency of the bootstrap approximation. In 
Section 4 we focus on a number of particular models and apply the general results of 
Section 3 to these models. Simulation results are reported in Section 5. We conclude the 
paper by giving in Section 6 the assumptions and the proofs of the main results. 

2. The test statistic 

Let $Y_i = (Y_{i1},\ldots,Y_{ik})^T$ and $m(x) = (m_1(x),\ldots,m_k(x))^T$, where
$m_l(x) = E(Y_{il}|X_i = x)$ is the $l$th regression curve for $l = 1,\ldots,k$. Let
$\varepsilon_i = Y_i - m(X_i)$ be the $i$th residual vector. Define
$\sigma_{lj}(x) = \mathrm{Cov}(\varepsilon_{il},\varepsilon_{ij}|X_i = x)$, which is the
conditional covariance between the $l$th and $j$th component of the residual vector. Then, the
conditional covariance matrix is
$\Sigma(x) = \mathrm{Var}(Y_i|X_i = x) = (\sigma_{lj}(x))_{k \times k}$.

Let $K$ be a $d$-dimensional kernel with a compact support on $[-1,1]^d$. Without loss of
generality, $K$ is assumed to be a product kernel based on a univariate kernel $k$, that is,
$K(t_1,\ldots,t_d) = \prod_{i=1}^d k(t_i)$, where $k$ is an $r$th-order kernel supported on
$[-1,1]$ and

$$\int k(u)\,du = 1, \quad \int u^l k(u)\,du = 0 \ \text{for } l = 1,\ldots,r-1 \quad \text{and} \quad \int u^r k(u)\,du = k_r \neq 0$$

for an integer $r \ge 2$. Define $K_h(u) = h^{-d}K(u/h)$. The Nadaraya-Watson (NW) kernel
estimator of $m_l(x)$, $l = 1,\ldots,k$, is

$$\hat m_l(x) = \frac{\sum_{i=1}^n K_{h_l}(x - X_i)Y_{il}}{\sum_{t=1}^n K_{h_l}(x - X_t)},$$

where $h_l$ is the smoothing bandwidth for curve $l$. Different bandwidths are allowed to
smooth different curves, which is sensible for multivariate responses. Then

$$\hat m(x) = (\hat m_1(x),\ldots,\hat m_k(x))^T$$

is the kernel estimator of the multiple regression curves. We assume throughout the paper 
that hi/h^ Pi as n — ^ oo, where h represents a baseline level of the smoothing bandwidth 
and Co < mini{/3;} < max;{/3i} < ci for finite and positive constants cq and ci free of n. 
Under the null hypothesis (1.1), 

=m(Xi,6'o,go) +ei, i = l,...,n, (2.1) 

where 9q is the true value of in 6, 50 is the true function in Q, the errors ei, . . . , e„ are 
independent and identically distributed, E(€i\Xi = x) = and Var(ei|Xj; = x) = E(a;). 
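The Nadaraya-Watson estimator defined above can be sketched in a few lines. The code below is only an illustration under simplifying assumptions that are not part of the paper: a scalar covariate ($d = 1$), a triangular second-order kernel, and hypothetical function names (`nw_estimate`, `nw_multiresponse`). It shows how each of the $k$ curves is smoothed with its own bandwidth $h_l$.

```python
def triangular_kernel(u):
    """Second-order kernel supported on [-1, 1] (an illustrative choice)."""
    return max(0.0, 1.0 - abs(u))


def nw_estimate(x, xs, ys, h, kernel=triangular_kernel):
    """Nadaraya-Watson estimate of E(Y | X = x) for a scalar covariate.

    xs, ys: observed covariates and responses (same length).
    h: bandwidth. The factor 1/h in K_h cancels between the numerator
    and the denominator, so it is omitted.
    """
    weights = [kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations within one bandwidth of x")
    return sum(w * y for w, y in zip(weights, ys)) / total


def nw_multiresponse(x, xs, ys_by_curve, bandwidths):
    """Apply nw_estimate to each of the k response components,
    allowing a separate bandwidth h_l per curve."""
    return [nw_estimate(x, xs, ys, h) for ys, h in zip(ys_by_curve, bandwidths)]
```

Because the covariate is shared across curves, only the response column and the bandwidth change from one component to the next.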

Let $\hat\theta$ be a $\sqrt{n}$-consistent estimator of $\theta_0$ and $\hat g$ be a consistent
estimator of $g_0$ under a norm $\|\cdot\|_{\mathcal{G}}$ defined on the complete metric space
$\mathcal{G}$. Any $\sqrt{n}$-consistent estimator of $\theta_0$ would be fine; for instance, the
pseudo-likelihood estimator that assumes that the residual distribution is normal. We suppose
$\hat g$ is a kernel estimator based on a kernel $L$ of order $s \ge 2$ and a bandwidth sequence
$b$, most likely different from the bandwidth $h$ used to estimate $m$. We will require that
$\hat g$ converges to $g_0$ faster than $(nh^d)^{-1/2}$, the optimal rate in a completely
$d$-dimensional nonparametric model. As demonstrated in Section 4, this can be easily satisfied
since $g$ is of lower dimension than the saturated nonparametric model for $m$.

Each $m_l(x,\theta,g)$ is smoothed by the same kernel $K$ and bandwidth $h_l$ as in the kernel
estimator $\hat m_l(x)$, in order to prevent the bias of the kernel regression estimators from
entering the asymptotic distribution of the test statistic (see also Härdle and Mammen (1993)):

$$\tilde m_l(x,\theta,g) = \frac{\sum_{i=1}^n K_{h_l}(x - X_i)\,m_l(X_i,\theta,g)}{\sum_{t=1}^n K_{h_l}(x - X_t)}$$

for $l = 1,\ldots,k$. Let
$\tilde m(x,\theta,g) = (\tilde m_1(x,\theta,g),\ldots,\tilde m_k(x,\theta,g))^T$.

We note in passing that the dimension of the response Yi does not contribute to the 
curse of dimensionality. Rather, it is the dimension of the covariate Xi that contributes, 
since Xi is the direct target of smoothing. Hence, as far as the curse of dimensionality is 
concerned, testing multiple curves is the same as testing a single regression curve. 
To formulate the empirical likelihood ratio test statistics, we first consider a fixed
$x \in R^d$ and then globalize by integrating the local likelihood ratio over a compact set
$S \subset R^d$ in the support of $X$. For each fixed $x \in S$, let

$$Q_i(x,\hat\theta) = [K_{h_1}(x - X_i)\{Y_{i1} - \tilde m_1(x,\hat\theta,\hat g)\},\ldots,K_{h_k}(x - X_i)\{Y_{ik} - \tilde m_k(x,\hat\theta,\hat g)\}]^T, \qquad (2.2)$$

which is a vector of local residuals at $x$ whose mean is approximately zero.

Let $\{p_i(x)\}_{i=1}^n$ be non-negative real numbers representing empirical likelihood weights
allocated to $\{(X_i,Y_i)\}_{i=1}^n$. The minus 2 log empirical likelihood ratio for the multiple
conditional mean evaluated at $\tilde m(x,\hat\theta,\hat g)$ is

$$\ell\{\tilde m(x,\hat\theta,\hat g)\} = -2\sum_{i=1}^n \log\{n p_i(x)\}$$

subject to $p_i(x) \ge 0$, $\sum_{i=1}^n p_i(x) = 1$ and
$\sum_{i=1}^n p_i(x)Q_i(x,\hat\theta) = 0$. By introducing a vector
of Lagrange multipliers $\lambda(x) \in R^k$, a standard empirical likelihood derivation (Owen
(1990)) shows that the optimal weights are given by

$$p_i(x) = \frac{1}{n}\{1 + \lambda^T(x)Q_i(x,\hat\theta)\}^{-1}, \qquad (2.3)$$

where $\lambda(x)$ solves

$$\sum_{i=1}^n \frac{Q_i(x,\hat\theta)}{1 + \lambda^T(x)Q_i(x,\hat\theta)} = 0.$$

Integrating $\ell\{\tilde m(x,\hat\theta,\hat g)\}$ against a weight function $\pi$ supported on
$S$ gives

$$\Lambda_n(\vec h) = \int \ell\{\tilde m(x,\hat\theta,\hat g)\}\pi(x)\,dx, \qquad (2.4)$$

which is our EL test statistic based on the bandwidth vector $\vec h = (h_1,\ldots,h_k)^T$.
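For a single curve ($k = 1$) the local EL ratio at a fixed $x$ reduces to a one-dimensional root-finding problem for the Lagrange multiplier. The sketch below illustrates that special case only; the function name `el_log_ratio` and the use of bisection are illustrative assumptions rather than the procedure used in the paper. It takes the scalar local residuals $Q_i(x,\hat\theta)$ as given and returns $\ell = 2\sum_i \log\{1 + \lambda Q_i\}$, which equals $-2\sum_i \log\{n p_i(x)\}$ at the optimal weights.

```python
import math


def el_log_ratio(q, tol=1e-12, max_iter=200):
    """Return (ell, lam): minus twice the log EL ratio and the multiplier.

    q: list of scalar local residuals Q_i(x). The optimal weights are
    p_i = 1 / (n * (1 + lam * q_i)), with lam solving
    sum_i q_i / (1 + lam * q_i) = 0.
    """
    lo_q, hi_q = min(q), max(q)
    if not (lo_q < 0.0 < hi_q):
        # Zero lies outside the convex hull of the q_i: no valid weights exist.
        return math.inf, math.nan

    def score(lam):
        return sum(qi / (1.0 + lam * qi) for qi in q)

    # Every 1 + lam * q_i is positive exactly on (-1/hi_q, -1/lo_q), and the
    # score is strictly decreasing there, so bisection locates the root.
    left, right = -1.0 / hi_q, -1.0 / lo_q
    eps = 1e-10 * (right - left)
    left, right = left + eps, right - eps
    for _ in range(max_iter):
        mid = 0.5 * (left + right)
        if score(mid) > 0.0:
            left = mid
        else:
            right = mid
        if right - left < tol:
            break
    lam = 0.5 * (left + right)
    ell = 2.0 * sum(math.log(1.0 + lam * qi) for qi in q)
    return ell, lam
```

For $k > 1$ the multiplier is a vector and a multivariate Newton iteration would be needed; the test statistic is then obtained by numerically integrating the local ratios against $\pi$.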

Define $\bar Q(x,\theta) = n^{-1}\sum_{i=1}^n Q_i(x,\theta)$, let $f(x)$ be the density of $X$,
$R(t) = \int K(u)K(tu)\,du$ and
$V(x) = f(x)(\beta_j^{-d}R(\beta_l/\beta_j)\sigma_{lj}(x))_{k \times k}$. Note that
$R(1) = R(K) =: \int K^2(u)\,du$ and that
$\beta_j^{-d}R(\beta_l/\beta_j) = \beta_l^{-d}R(\beta_j/\beta_l)$, indicating that $V(x)$ is a
symmetric matrix.

Derivations given in Section 6 show that

$$\Lambda_n(\vec h) = nh^d \int \bar Q^T(x,\theta_0)V^{-1}(x)\bar Q(x,\theta_0)\pi(x)\,dx + o_p(h^{d/2}),$$

where $h^{d/2}$ is the stochastic order of the first term on the right-hand side if $d < 4r$.
Here $r$ is the order of the kernel $K$. Since
$\bar Q(x,\theta_0) = f(x)\{\hat m(x) - \tilde m(x,\theta_0,\hat g)\}\{1 + o_p(1)\}$,
$\bar Q(x,\theta_0)$ serves as a raw discrepancy measure between
$\hat m(x) = (\hat m_1(x),\ldots,\hat m_k(x))$ and the hypothesized model
$\tilde m(x,\theta_0,\hat g)$. There is a key issue on how much each
$\hat m_l(x) - \tilde m_l(x,\theta_0,\hat g)$ contributes to the final statistic. The EL
distributes the contributions according to $nh^d V^{-1}(x)$, the inverse of the covariance matrix
of $\bar Q(x,\theta_0)$, which is the most natural choice. The nice thing about the EL formulation
is that this is done without explicit estimation of $V(x)$ due to its internal standardization.
Estimating $V(x)$ when $k$ is large can be challenging if not just tedious.

3. Main results 

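Let $(\gamma_{lj}(x))_{k \times k} = ((\beta_j^{-d}R(\beta_l/\beta_j)\sigma_{lj}(x))_{k \times k})^{-1}$,

$$\omega_{l_1,l_2,j_1,j_2}(\beta,K) = \int\!\!\int\!\!\int \beta_{l_2}^{-d}K(u)K(v)K\{(\beta_{j_2}z + \beta_{l_1}u)/\beta_{l_2}\}K(z + \beta_{j_1}v/\beta_{j_2})\,du\,dv\,dz,$$

and let $\sigma^2(K,\Sigma)$ denote the asymptotic variance of the test statistic, a sum over
$l_1,l_2,j_1,j_2 = 1,\ldots,k$ of terms involving $\omega_{l_1,l_2,j_1,j_2}(\beta,K)$,
$\gamma_{lj}(x)$, $\sigma_{lj}(x)$ and $\pi^2(x)$,
which is a bounded quantity under assumptions (A.1) and (A.4) given in Section 6.

Theorem 3.1. Under the assumptions (A.1)-(A.6) and (B.1)-(B.5) given in Section 6,
and under $H_0$,

$$h^{-d/2}\{\Lambda_n(\vec h) - k\} \xrightarrow{d} N(0,\sigma^2(K,\Sigma))$$

as $n \to \infty$.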

Remark 3.1 (Equal bandwidths). If $h_1 = \cdots = h_k = h$, that is,
$\beta_1 = \cdots = \beta_k = 1$, then $\omega_{l_1,l_2,j_1,j_2}(\beta,K) = K^{(4)}(0)$, where
$K^{(4)}$ is the convolution of $K^{(2)}$ and $K^{(2)}$ is the convolution of $K$, that is,
$K^{(2)}(u) = \int K(v)K(u + v)\,dv$. Since $V(x) = f(x)R(K)\Sigma(x)$ in the case of equal
bandwidths, $\sum_{l=1}^k \gamma_{lj_1}(x)\sigma_{lj_2}(x) = I(j_1 = j_2)R^{-1}(K)$, where $I$ is
the indicator function. Therefore,
$\sigma^2(K,\Sigma) = 2kK^{(4)}(0)R^{-2}(K)\int \pi^2(x)\,dx$, which is entirely
known upon giving the kernel function $K$ and the weight function $\pi$. Hence, the EL test
statistic is asymptotically pivotal.
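The pivotal variance in the equal-bandwidth case can be evaluated numerically once a kernel and a weight function are fixed. The following sketch does this for $d = 1$ with a triangular kernel; the midpoint-rule integrator and all function names are illustrative assumptions, and the convolution convention $K^{(2)}(u) = \int K(v)K(u+v)\,dv$ is the one stated in the remark.

```python
def triangular_kernel(u):
    """Triangular kernel on [-1, 1] (an illustrative choice)."""
    return max(0.0, 1.0 - abs(u))


def integrate(f, a, b, n=400):
    """Midpoint rule on [a, b] with n cells."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))


def roughness(kernel):
    """R(K) = integral of K^2 over [-1, 1]."""
    return integrate(lambda u: kernel(u) ** 2, -1.0, 1.0)


def conv2(kernel, u):
    """K^(2)(u) = integral of K(v) K(u + v) dv over the support of K."""
    return integrate(lambda v: kernel(v) * kernel(u + v), -1.0, 1.0)


def conv4_at_zero(kernel):
    """K^(4)(0) = integral of K^(2)(z)^2 dz; K^(2) vanishes outside [-2, 2]."""
    return integrate(lambda z: conv2(kernel, z) ** 2, -2.0, 2.0)


def pivotal_variance(kernel, k, pi_squared_integral):
    """sigma^2 = 2 k K^(4)(0) R(K)^(-2) * integral of pi^2, for d = 1."""
    return 2.0 * k * conv4_at_zero(kernel) / roughness(kernel) ** 2 * pi_squared_integral
```

With the indicator weight $\pi(x) = I(0.1 \le x \le 0.9)$ the last argument is simply 0.8, so the variance depends on nothing that has to be estimated from the data.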

Remark 3.2 (Unequal bandwidths). If the bandwidths are not all the same, the
asymptotic variance of $\Lambda_n(\vec h)$ may depend on $\Sigma(x)$, which means that the EL test
statistic is no longer asymptotically pivotal. However, the distribution of $\Lambda_n(\vec h)$ is
always free of the design distribution of $X_i$.

Let $q_{n\alpha}$ be the upper $\alpha$-quantile of the distribution of
$h^{-d/2}\{\Lambda_n(\vec h) - k\}$ for a significance level $\alpha \in (0,1)$. Theorem 3.1
assures that $q_{n\alpha} \to z_\alpha$, the upper $\alpha$-quantile of
$N(0,\sigma^2(K,\Sigma))$. However, the convergence can be slow. There is also an issue of
estimating $\sigma^2(K,\Sigma)$ when different bandwidths are used. For these reasons we prefer to
use a bootstrap approximation to calibrate the quantile $q_{n\alpha}$.
Remark 3.3 (Bootstrap). Let $\hat\varepsilon_i = Y_i - \hat m(X_i)$ be the estimated residual
vectors for $i = 1,\ldots,n$ and $G$ be a multivariate $k$-dimensional random vector such that
$E(G) = 0$, $\mathrm{Var}(G) = I_k$ and $G$ has bounded fourth-order moments. To facilitate simple
construction of the test statistic, and faster convergence, we propose the following bootstrap
estimate of $q_{n\alpha}$.

Step 1: For $i = 1,\ldots,n$, generate $\varepsilon_i^* = \hat\varepsilon_i G_i$ where
$G_1,\ldots,G_n$ are independent and identical copies of $G$, and let
$Y_i^* = m(X_i,\hat\theta,\hat g) + \varepsilon_i^*$. Re-estimate $\theta$ and $g$ based on
$\{(X_i,Y_i^*)\}_{i=1}^n$ and denote them as $\hat\theta^*$ and $\hat g^*$.

Step 2: Compute the EL ratio at $\tilde m(x,\hat\theta^*,\hat g^*)$ based on
$\{(X_i,Y_i^*)\}_{i=1}^n$, denote it as $\ell^*\{\tilde m(x,\hat\theta^*,\hat g^*)\}$ and then
obtain the bootstrap version of the test statistic
$\Lambda_n^*(\vec h) = \int \ell^*\{\tilde m(x,\hat\theta^*,\hat g^*)\}\pi(x)\,dx$ and let
$\zeta^* = h^{-d/2}\{\Lambda_n^*(\vec h) - k\}$.

Step 3: Repeat steps 1 and 2 $N$ times, and obtain $\zeta_1^* \le \cdots \le \zeta_N^*$ without
loss of generality.

The bootstrap estimate of $q_{n\alpha}$ is then
$\hat q_{n\alpha} = \zeta^*_{[N(1-\alpha)]+1}$, the upper $\alpha$-quantile of the ordered
bootstrap values.

The proposed EL test with $\alpha$-level of significance rejects $H_0$ if
$h^{-d/2}\{\Lambda_n(\vec h) - k\} > \hat q_{n\alpha}$.
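The three bootstrap steps can be organized as a small resampling loop. The sketch below is an illustration for a scalar response ($k = 1$) under several assumptions that are not from the paper: Rademacher multipliers are used for $G$ (they have mean 0, variance 1 and bounded moments), `statistic` is a hypothetical user-supplied callable that re-estimates $\theta$ and $g$ on the bootstrap responses and returns $h^{-d/2}\{\Lambda_n^* - k\}$, and the upper $\alpha$-quantile is taken as the $([N(1-\alpha)]+1)$-th smallest bootstrap value.

```python
import math
import random


def upper_quantile(values, alpha):
    """Upper alpha-quantile: the ([N(1 - alpha)] + 1)-th smallest value."""
    ordered = sorted(values)
    index = min(math.floor(len(ordered) * (1.0 - alpha)), len(ordered) - 1)
    return ordered[index]


def wild_bootstrap_quantile(fitted, residuals, statistic, n_boot, alpha, rng):
    """Bootstrap estimate of q_{n,alpha} for a scalar response (k = 1).

    fitted: model values m(X_i, theta_hat, g_hat).
    residuals: Y_i - m_hat(X_i) from the nonparametric fit.
    statistic: hypothetical callable mapping a list of bootstrap responses
        to the standardized statistic; it must redo the estimation itself.
    rng: a random.Random instance, so that runs are reproducible.
    """
    draws = []
    for _ in range(n_boot):
        # Step 1: wild bootstrap responses with Rademacher multipliers.
        y_star = [m + e * rng.choice((-1.0, 1.0)) for m, e in zip(fitted, residuals)]
        # Step 2: bootstrap version of the standardized statistic.
        draws.append(statistic(y_star))
    # Step 3: order the N values and read off the upper alpha-quantile.
    return upper_quantile(draws, alpha)
```

The test would then reject when the observed standardized statistic exceeds the returned value.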

Remark 3.4 (Bandwidth selection). Each bandwidth $h_l$ used in the kernel regression
estimator $\hat m_l(x)$ can be chosen by a standard bandwidth selection procedure; for instance,
the cross-validation (CV) method. The range in terms of order of magnitude for all the
$k$ bandwidths $\{h_l\}_{l=1}^k$ covers the order of $n^{-1/(d+2r)}$, which is the optimal order
that minimizes the mean integrated squared error in the estimation of $m_l$ and is also the
asymptotic order of the bandwidth selected by the CV method. We also note that once
$\{h_l\}_{l=1}^k$ are chosen, the same set of bandwidths will be used in formulating the bootstrap
version of the test statistic $\Lambda_n^*(\vec h)$.
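A leave-one-out cross-validation search over a grid is one common way to carry out the bandwidth choice mentioned in this remark. The sketch below is an illustration only, for a scalar covariate and one response curve; the grid search, the triangular kernel and the function names are assumptions, not the exact procedure of the paper.

```python
def triangular_kernel(u):
    """Triangular kernel on [-1, 1] (an illustrative choice)."""
    return max(0.0, 1.0 - abs(u))


def loo_cv_score(xs, ys, h, kernel=triangular_kernel):
    """Mean squared leave-one-out prediction error of the NW estimator."""
    total, used = 0.0, 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            if j != i:
                w = kernel((xi - xj) / h)
                num += w * yj
                den += w
        if den > 0.0:  # skip points with no neighbour inside the window
            total += (yi - num / den) ** 2
            used += 1
    return total / used if used else float("inf")


def cv_bandwidth(xs, ys, grid):
    """Return the candidate bandwidth in grid with the smallest CV score."""
    return min(grid, key=lambda h: loo_cv_score(xs, ys, h))
```

With $k$ response curves the search would be repeated once per curve, and the selected bandwidths would be held fixed across bootstrap replications.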

Theorem 3.2. Under assumptions (A.1)-(A.6) and (B.1)-(B.5) given in Section 6, and
under $H_0$,

$$P(h^{-d/2}\{\Lambda_n(\vec h) - k\} > \hat q_{n\alpha}) \to \alpha,$$

as $\min(n,N) \to \infty$.

Theorem 3.2 maintains that the proposed test has asymptotically correct size. 
We next consider the power of the test under a sequence of local alternatives. First, 
consider the following local alternative hypothesis: 

$$H_{1n}: m(\cdot) = m(\cdot,\theta_0,g_0) + c_n\Gamma_n(\cdot), \qquad (3.1)$$

where $c_n = n^{-1/2}h^{-d/4}$ and $\Gamma_n(x) = (\Gamma_{n1}(x),\ldots,\Gamma_{nk}(x))^T$ for
some bounded functions $\Gamma_{nl}(\cdot)$ ($l = 1,\ldots,k$).

We need the following theorem on the asymptotic distribution of the EL test statistic
under $H_{1n}$ in order to evaluate the power of the EL test.
Theorem 3.3. Under the assumptions (A.1)-(A.7) and (B.1)-(B.5) given in Section 6,
and under $H_{1n}$,

$$h^{-d/2}\{\Lambda_n(\vec h) - k\} \xrightarrow{d} N(\beta(f,K,\Sigma,\Gamma),\sigma^2(K,\Sigma))$$

as $n \to \infty$, where

$$\beta(f,K,\Sigma,\Gamma) = \int \Gamma^T(x)V^{-1}(x)\Gamma(x)f^2(x)\pi(x)\,dx$$

and $\Gamma(x) = \lim_{n\to\infty}\Gamma_n(x)$, assuming such a limit exists.

Remark 3.5 (Power). The asymptotic mean of the EL test statistic is given by
$\beta(f,K,\Sigma,\Gamma) = \int \Gamma^T(x)V^{-1}(x)\Gamma(x)f^2(x)\pi(x)\,dx$, which is bounded
away from zero since $V(x)$ is positive definite with the smallest eigenvalue uniformly bounded
away from zero. As a result, the EL test has a non-trivial asymptotic power,

$$\Phi[\{\beta(f,K,\Sigma,\Gamma) - z_\alpha\}/\sigma(K,\Sigma)],$$

where $\Phi$ is the distribution function of the standard normal distribution. We note here
that the above power is attained for any $\Gamma(x)$ without requiring specific directions in
which $H_{1n}$ deviates from $H_0$. This indicates the proposed test is able to test consistently
any departure from $H_0$. In summary we have the following theorem.

Remark 3.6 (Choice of $\pi$). The choice of the weight function $\pi$ will affect the
performance of the test. This is reflected in the power function given in Remark 3.5, as both
$\beta(f,K,\Sigma,\Gamma)$ and $\sigma(K,\Sigma)$ depend on $\pi$. One possible way to select
$\pi$ is to maximize the following expression with respect to $\pi$:

$$\{\beta(f,K,\Sigma,\Gamma) - z_\alpha\}/\sigma(K,\Sigma),$$

which is the argument of the asymptotic power function of the test. The power also depends on the
local alternative function $\Gamma_n$ as well as on $f$ and the covariance $\Sigma$. While $f$ and
$\Sigma$ can be estimated empirically, the optimization of the selection of $\pi$ will have to be
done by assuming some special form of $\Gamma_n$.

Theorem 3.4. Under assumptions (A.1)-(A.6) and (B.1)-(B.5) given in Section 6, and
under $H_{1n}$,

$$P(h^{-d/2}\{\Lambda_n(\vec h) - k\} > \hat q_{n\alpha}) \to \Phi[\{\beta(f,K,\Sigma,\Gamma) - z_\alpha\}/\sigma(K,\Sigma)],$$

as $\min(n,N) \to \infty$.
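The limiting power in Theorem 3.4 is easy to evaluate once values of $\beta$ and $\sigma$ are available. The sketch below is an illustration using Python's standard library; the function name `asymptotic_power` is hypothetical, and $z_\alpha$ is taken to be the upper $\alpha$-quantile of $N(0,\sigma^2)$, as stated after Theorem 3.1, so that $z_\alpha = \sigma\,\Phi^{-1}(1-\alpha)$.

```python
from statistics import NormalDist


def asymptotic_power(beta, sigma, alpha):
    """Phi({beta - z_alpha} / sigma), where z_alpha is the upper
    alpha-quantile of N(0, sigma^2)."""
    std_normal = NormalDist()
    z_alpha = sigma * std_normal.inv_cdf(1.0 - alpha)
    return std_normal.cdf((beta - z_alpha) / sigma)
```

Setting `beta` to zero recovers the nominal level $\alpha$, and the power increases with `beta`, which is the sense in which a larger integrated discrepancy $\Gamma^T V^{-1}\Gamma$ is easier to detect.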
4. Examples

In this section we will apply the general results obtained in Section 3 to a number of
particular models: partially linear models, single index models, additive models, monotone
regression models, the selection of variables and the simultaneous testing of the
conditional mean and variance. These six examples form a representative subset of the
more complete list of examples given in the introduction. For the other examples
not treated here, the development is quite similar.

4.1. Partially linear models 

Consider the model

$$Y_i = m(X_i,\theta_0,g_0) + \varepsilon_i = \theta_{00} + \theta_{01}X_{i1} + \cdots + \theta_{0,d-1}X_{i,d-1} + g_0(X_{id}) + \varepsilon_i, \qquad (4.1)$$

where $Y_i$ is a one-dimensional response variable ($k = 1$), $d > 1$,
$E(\varepsilon_i|X_i = x) = 0$ and $\mathrm{Var}(\varepsilon_i|X_i = x) = \Sigma(x)$
($1 \le i \le n$). For identifiability reasons we assume that
$E(g_0(X_{id})) = 0$. This testing problem has been studied in Yatchew (1992), Whang and
Andrews (1993) and Rodriguez-Poo, Sperlich and Vieu (2009), among others. For any
$\theta \in R^d$ and $x \in R$, let

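$$\hat h(x,\theta) = \sum_{i=1}^n W_{in}(x,b)[Y_i - \theta_0 - \theta_1X_{i1} - \cdots - \theta_{d-1}X_{i,d-1}], \qquad (4.2)$$

$$\hat g(x,\theta) = \hat h(x,\theta) - \frac{1}{n}\sum_{i=1}^n \hat h(X_{id},\theta), \qquad (4.3)$$

where

$$W_{in}(x,b) = \frac{L((x - X_{id})/b)}{\sum_{j=1}^n L((x - X_{jd})/b)},$$

$b$ is a univariate bandwidth sequence and $L$ is a kernel function. Next, define

$$\hat\theta = \arg\min_{\theta \in R^d}\sum_{i=1}^n [Y_i - \theta_0 - \theta_1X_{i1} - \cdots - \theta_{d-1}X_{i,d-1} - \hat g(X_{id},\theta)]^2.$$

Then, $\hat\theta - \theta_0 = O_p(n^{-1/2})$, see Härdle, Liang and Gao ((2000), Chapter 2), and

$$|m(X_i,\theta_0,\hat g) - m(X_i,\theta_0,g_0)| = |\hat g(X_{id},\theta_0) - g_0(X_{id})| = O_p\{(nb)^{-1/2}\log(n)\} = o_p\{(nh^d)^{-1/2}\log(n)\},$$

uniformly in $1 \le i \le n$, provided $h^d/b \to 0$. This is the case when
$h \sim n^{-1/(d+4)}$ and $b \sim n^{-1/5}$. Hence, condition (B.1) is satisfied. Conditions (B.2)
and (B.3) obviously hold, since

$$\frac{\partial m(X_i,\theta,g)}{\partial\theta} = (1,X_{i1},\ldots,X_{i,d-1})^T \quad \text{and} \quad \frac{\partial^2 m(X_i,\theta,g)}{\partial\theta\,\partial\theta^T} = 0$$

for any $g$. When the order of the kernel $L$ equals 2,
$E\{\hat g(x,\theta_0)\} = g_0(x) + O(b^2)$ uniformly in $x$ and $O(b^2)$ is $o(h^2)$ provided
$b/h \to 0$, which is satisfied for the above choices of $h$ and $b$. Hence, (B.4) is satisfied
for $r = 2$. Finally, for condition (B.5), if we choose $\mathcal{G}$ equal
to the class of continuously differentiable and bounded functions, then it is easily seen
that $P(\hat g \in \mathcal{G}) \to 1$ as $n \to \infty$.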
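The two bandwidth conditions used in this example, $h^d/b \to 0$ and $b/h \to 0$, can be checked by comparing exponents of $n$. The sketch below does this with exact fractions; the parametrization $h \sim n^{-1/(d+2r)}$ and $b \sim n^{-1/(1+2s)}$ generalizes the choices $h \sim n^{-1/(d+4)}$ and $b \sim n^{-1/5}$ of the text (recovered with $r = s = 2$) and is an assumption made for illustration, as are the function names.

```python
from fractions import Fraction


def rate_exponents(d, r=2, s=2):
    """Exponents of n in h^d / b and in b / h when h ~ n^(-1/(d + 2r))
    and b ~ n^(-1/(1 + 2s)) (univariate smoothing for g)."""
    h_exp = Fraction(-1, d + 2 * r)
    b_exp = Fraction(-1, 1 + 2 * s)
    return d * h_exp - b_exp, b_exp - h_exp


def conditions_hold(d, r=2, s=2):
    """Both ratios vanish as n grows if and only if both exponents are negative."""
    first, second = rate_exponents(d, r, s)
    return first < 0 and second < 0
```

For $d = 2$ the exponents are $-2/15$ and $-1/30$, so both ratios tend to zero, whereas for $d = 1$ the two bandwidths have the same order and the conditions fail, consistent with the requirement $d > 1$ in model (4.1).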



4.2. Single index models 

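In single index models it is assumed that

$$Y_i = m(X_i,\theta_0,g_0) + \varepsilon_i = g_0(\theta_0^TX_i) + \varepsilon_i, \qquad (4.4)$$

where $k$ (the dimension of $Y_i$) equals 1, $\theta_0 = (\theta_{01},\ldots,\theta_{0d})^T$,
$X_i = (X_{i1},\ldots,X_{id})^T$ for some $d > 1$, $E(\varepsilon_i|X_i = x) = 0$ and
$\mathrm{Var}(\varepsilon_i|X_i = x) = \Sigma(x)$ ($1 \le i \le n$). In order to identify
the model, set $\|\theta_0\| = 1$. See, e.g., Xia, Li, Tong and Zhang (2004), Stute and Zhu (2005)
and Rodriguez-Poo, Sperlich and Vieu (2009) for procedures to test this single index
model. For any $\theta \in \Theta$ and $u \in R$, let

$$\hat g(u,\theta) = \sum_{i=1}^n \frac{L_b(u - \theta^TX_i)}{\sum_{j=1}^n L_b(u - \theta^TX_j)}Y_i.$$

Then, the estimator of $\theta_0$ is defined by

$$\hat\theta = \arg\min_{\theta}\sum_{i=1}^n [Y_i - \hat g(\theta^TX_i,\theta)]^2.$$

Härdle, Hall and Ichimura (1993) showed that $\hat\theta - \theta_0 = O_p(n^{-1/2})$. From
standard kernel regression theory we know that

$$\max_i |m(X_i,\theta_0,\hat g) - m(X_i,\theta_0,g_0)| \le \sup_u |\hat g(u,\theta_0) - g_0(u)| = O_p\{(nb)^{-1/2}\log(n)\} = o_p\{(nh^d)^{-1/2}\log(n)\},$$

$$\max_i \left|\frac{\partial}{\partial\theta}m(X_i,\theta_0,\hat g) - \frac{\partial}{\partial\theta}m(X_i,\theta_0,g_0)\right| \le C\sup_u |\hat g'(u,\theta_0) - g_0'(u)| = O_p\{(nb^3)^{-1/2}\log(n)\} = o_p(1),$$

$$\max_i \left|\frac{\partial^2}{\partial\theta\,\partial\theta^T}m(X_i,\theta_0,\hat g)\right| \le C\sup_u |\hat g''(u,\theta_0)| = C\sup_u |g_0''(u)| + O_p\{(nb^5)^{-1/2}\log(n)\} = o_p(n^{1/2})$$

and $\sup_u |E\{\hat g(u,\theta_0)\} - g_0(u)| = O(b^2) = o(h^2)$, for some $C > 0$, provided
$h^d/b \to 0$, $b/h \to 0$ and $nb^3\log^{-2}(n) \to \infty$, which is the case (as for the
partially linear model) when, for example, $h \sim n^{-1/(d+4)}$ and $b \sim n^{-1/5}$.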

4.3. Additive models 

We suppose now that the model is given by

$$Y_i = m_{00} + m_{10}(X_{i1}) + \cdots + m_{d0}(X_{id}) + \varepsilon_i, \qquad (4.5)$$

where $k = 1$, $d > 1$, $E(\varepsilon_i|X_i = x) = 0$,
$\mathrm{Var}(\varepsilon_i|X_i = x) = \Sigma(x)$ and $E(m_{j0}(X_{ij})) = 0$
($1 \le i \le n$; $1 \le j \le d$). The estimation of the parameter $m_{00}$ and of the functions
$m_{j0}(\cdot)$ ($1 \le j \le d$) has been considered in Linton and Nielsen (1995) (marginal
integration); Opsomer and Ruppert (1997) (backfitting); and Mammen, Linton and Nielsen (1999)
(smooth backfitting). Using the covering technique to extend pointwise convergence results to
uniform results (see, e.g., Bosq (1998)), it can be shown that the estimators $\hat m_j(\cdot)$
($j = 1,\ldots,d$) considered in these papers satisfy the following properties:

$$\sup_x |\hat m_j(x) - m_{j0}(x)| = O_p\{(nb)^{-1/2}\log(n)\},$$

$$\sup_x |E\{\hat m_j(x)\} - m_{j0}(x)| = O(b^2),$$

where $b$ is the bandwidth used for any of these estimators. Hence, assumptions (B.1)-(B.5)
hold true provided $h^d/b \to 0$ and $b/h \to 0$, which is the case when, for example, $h$
and $b$ equal the optimal bandwidths for kernel estimation in dimension $d$, respectively 1,
namely $h \sim n^{-1/(d+4)}$ and $b \sim n^{-1/5}$ (take $r = s = 2$).

4.4. Monotone regression 

Consider now the following model 

$$Y_i = m_0(X_i) + \varepsilon_i, \qquad (4.6)$$

where Xi and Yi are one-dimensional and we assume that mo is monotone. An overview of 
nonparametric methods for estimating a monotone regression function, as well as testing 
for monotonicity, is given in Gijbels (2005). See also Dette, Neumeyer and Pilz (2006) 
for a recent contribution in this area. In the latter paper, a preliminary (not necessarily 
monotone) estimator of the regression function is used to obtain a monotone estimator 
of the inverse of the regression function, which is then inverted to obtain a monotone 
estimator of the regression function itself. 
Let $\hat m(x)$ be an arbitrary estimator of $m_0(x)$ under the assumption of monotonicity
that is based on a bandwidth sequence $b$ and a kernel $L$ of order $s$ and that satisfies

$$\sup_x |\hat m(x) - m_0(x)| = O_p\{(nb)^{-1/2}\log(n)\},$$

$$\sup_x |E\{\hat m(x)\} - m_0(x)| = O(b^s)$$

(as for the additive model, the uniformity in $x$ can be obtained by using classical tools
based on the covering technique). For instance, the estimator given in Dette, Neumeyer
and Pilz (2006) satisfies this property (see their Theorem 3.2). Let $\mathcal{G}$ be the class of
monotone functions defined on the support of $X$. Then the regularity conditions (B.1)-(B.5)
on $\hat m(x)$ are satisfied provided $h/b \to 0$ and $b^s/h^r \to 0$; for example, when
$s = 3$, $r = 2$, $b = Kn^{-1/5}$ and $h = b\log^{-1}(n)$. Note that conditions (B.2)-(B.3) are
automatically satisfied, since there is no parametric component in the model. Contrary to the
previous examples, here we cannot take $h$ and $b$ equal to the optimal bandwidths for kernel
estimation, as they both involve univariate smoothing. It now follows from Theorem 3.1
that $h^{-1/2}(\Lambda_n(h) - 1) \xrightarrow{d} N(0,\sigma^2)$ for some $\sigma^2 > 0$, which by
Remark 3.1 only depends on $K$ and $\pi$.



4.5. Selection of variables 



In this example we apply the general testing procedure to the problem of selecting
explanatory variables in regression. Let $X_i = (X_i^{(1)T},X_i^{(2)T})^T$ be a vector of
$d = d_1 + d_2$ ($d_1,d_2 \ge 1$) explanatory variables. We would like to test whether the vector
$X_i^{(2)}$ should or should not be included in the model. See Delgado and González Manteiga
(2001) for other nonparametric approaches to this problem. Our null model is

$$Y_i = m_0(X_i^{(1)}) + \varepsilon_i. \qquad (4.7)$$

Hence, under the hypothesized model, the regression function $m(x^{(1)},x^{(2)})$ is equal to a
function $m_0(x^{(1)})$ only. In our testing procedure we estimate $m_0(\cdot)$ by a kernel
estimator based on a $d_1$-dimensional kernel function $L$ of order $s = 2$ and a bandwidth
sequence $b$. It is easily seen that this estimator satisfies the regularity conditions provided
$h^d/b^{d_1} \to 0$ and $b/h \to 0$ (take $r = 2$). As before, the optimal bandwidths for
estimation, namely $h \sim n^{-1/(d+4)}$ and $b \sim n^{-1/(d_1+4)}$, satisfy these constraints.



4.6. Simultaneous testing of the conditional mean and variance 

Let $Z_i = r(X_i) + \Sigma^{1/2}(X_i)e_i$ where $Z_i$ is a $k_1$-dimensional response variable of
a $d$-dimensional covariate $X_i$, and $r(x) = E(Z_i|X_i = x)$ and
$\Sigma(x) = \mathrm{Var}(Z_i|X_i = x)$ are respectively the conditional mean and variance
functions. This is a standard multivariate nonparametric regression model. Suppose that
$r(x,\theta,g)$ and $\Sigma(x,\theta,g)$ are certain working models for the conditional mean and
variance, respectively. Hence, the hypothesized regression model is

$$Z_i = r(X_i,\theta,g) + \Sigma^{1/2}(X_i,\theta,g)e_i, \qquad (4.8)$$

where the standardized residuals $\{e_i\}_{i=1}^n$ satisfy $E(e_i|X_i) = 0$ and
$\mathrm{Var}(e_i|X_i) = I_{k_1}$. Here, $I_{k_1}$ is the $k_1$-dimensional identity matrix,
matching the dimension of $Z_i$. Clearly, the parametric (without $g$) or semi-parametric (with
$g$) model specification of (4.8) consists of two components: one for the regression part
$r(X_i,\theta,g)$ and the other for the conditional variance part $\Sigma(X_i,\theta,g)$. The
model (4.8) is valid if and only if both components of the specification are valid simultaneously.
Hence, we need to test the goodness of fit of both $r(x,\theta,g)$ and $\Sigma(x,\theta,g)$
simultaneously.

To use the notation of this paper, we have

$$m(x,\theta,g) = (r^T(x,\theta,g), \mathrm{vec}^T\{\Sigma(x,\theta,g) + r(x,\theta,g)r^T(x,\theta,g)\})^T$$

and the multivariate "response" $Y_i = (Z_i^T, \mathrm{vec}^T(Z_iZ_i^T))^T$. Here
$\mathrm{vec}(A)$ denotes the operator that stacks columns of a matrix $A$ into a vector.

5. Simulations 

We carry out two simulation studies. For the first one, consider the following model:

$Y_i = 1 + 0.5X_{i1} + ag_1(X_{i1}) + g_2(X_{i2}) + \varepsilon_i$ (5.1)

($i = 1,\ldots,n$). Here, the covariates $X_{i1}$ and $X_{i2}$ are independent and follow a uniform
distribution on $[0,1]$, and the error $\varepsilon_i$ is independent of $X_i = (X_{i1},X_{i2})$ and follows a
normal distribution with mean zero and variance given by
$\mathrm{Var}(\varepsilon_i|X_i) = (1.5 + X_{i1} + X_{i2})^2/100$. Several choices are considered for the constant
$a \ge 0$ and the functions $g_1$ and $g_2$. We are interested in testing whether the data follow a
partially linear model, in the sense that the regression function is linear in $X_{i1}$, and
(possibly) nonlinear in $X_{i2}$.
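
The data-generating process in (5.1) can be sketched in a few lines of NumPy. The function name `simulate_model_51` and the seed are assumptions made for illustration; the choices $g_1(x_1) = 2\log(x_1 + 0.5)$ and $g_2(x_2) = \exp(x_2)$ used in the demo are among those listed in Table 1.

```python
import numpy as np


def simulate_model_51(n, a, g1, g2, rng):
    """Draw one sample of size n from model (5.1).

    X_i1, X_i2 ~ Uniform[0, 1], independent; the error is normal with
    mean zero and conditional variance (1.5 + X_i1 + X_i2)^2 / 100.
    """
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    sd = (1.5 + X[:, 0] + X[:, 1]) / 10.0          # square root of the variance
    eps = rng.normal(0.0, 1.0, size=n) * sd
    Y = 1.0 + 0.5 * X[:, 0] + a * g1(X[:, 0]) + g2(X[:, 1]) + eps
    return X, Y


if __name__ == "__main__":
    rng = np.random.default_rng(2009)              # arbitrary seed
    X, Y = simulate_model_51(100, 0.5, lambda x: 2.0 * np.log(x + 0.5), np.exp, rng)
    print(X.shape, Y.shape)
```

Setting `a = 0` gives data from the null (partially linear) model regardless of the choice of `g1`.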

We will compare our EL-based test with the test considered by Rodríguez-Poo, Sperlich
and Vieu (2009) (RSV hereafter), which is based on the $L_\infty$-distance between a
completely nonparametric kernel estimator of the regression function and an estimator
obtained under the assumption that the model is partially linear.

The simulations are carried out for samples of size 100 and 200. The significance level is
$\alpha = 0.05$. A total of 300 samples are selected at random, and for each sample 300 random
resamples are drawn. We choose to work with the weight function $\pi(x) = I(0.1 \le x \le 0.9)$,
in order to avoid boundary effects for small and large values of $x$. A triangular
kernel function $K(u) = (1 - |u|)I(|u| \le 1)$ is used and we determine the bandwidth $b$
by using a cross-validation procedure. For the bandwidth $h$, we follow the procedure
used by Rodríguez-Poo, Sperlich and Vieu (2009), that is, we consider the test statistic
$\sup_{h_0 \le h \le h_1}[h^{-d/2}\{\Lambda_n(h) - k\}]$, where $h_0$ and $h_1$ are chosen in such a way that the
bandwidth obtained by cross-validation is included in the interval. For $n = 100$ we take
$h_0 = 0.22$ and $h_1 = 0.28$ and for $n = 200$ we select $h_0 = 0.18$ and $h_1 = 0.24$. The critical
values for this test statistic are obtained from the distribution of the bootstrap statistic,
given by $\sup_{h_0 \le h \le h_1}[h^{-d/2}\{\Lambda_n^*(h) - k\}]$.
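
The ingredients of this procedure can be sketched as follows. This is only an outline under stated assumptions: the supremum over $[h_0, h_1]$ is approximated by a maximum over a finite grid, `stat_fn` is a hypothetical user-supplied callable returning the EL statistic for a given bandwidth (its computation is not shown), and the critical value is taken as the empirical upper-$\alpha$ quantile of the bootstrap statistics.

```python
import numpy as np


def triangular_kernel(u):
    """K(u) = (1 - |u|) I(|u| <= 1)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 1.0 - np.abs(u), 0.0)


def weight(x, lo=0.1, hi=0.9):
    """Weight function I(lo <= x <= hi), used to avoid boundary effects."""
    x = np.asarray(x, dtype=float)
    return ((x >= lo) & (x <= hi)).astype(float)


def sup_statistic(stat_fn, h_grid, d, k):
    """Grid approximation of sup_h h^{-d/2} {stat_fn(h) - k}.

    stat_fn is a hypothetical callable h -> EL statistic at bandwidth h.
    """
    return max(h ** (-d / 2.0) * (stat_fn(h) - k) for h in h_grid)


def bootstrap_decision(observed, bootstrap_stats, alpha=0.05):
    """Reject when the observed statistic exceeds the bootstrap upper-alpha quantile."""
    crit = np.quantile(np.asarray(bootstrap_stats, dtype=float), 1.0 - alpha)
    return observed > crit, crit
```

For $n = 100$ the grid could be, for example, `np.linspace(0.22, 0.28, 7)`; the number of grid points is a free choice that is not specified in the text.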

The results are shown in Table 1. The table shows that the level is well respected for
both sample sizes, and for both choices of the function $g_2$. Under the alternative
hypothesis, the power increases with the sample size and with the value of $a$ for all
the models considered. The empirical likelihood test is in general more powerful than
the RSV test when $a$ is small ($a = 0.5$ and $1.0$). For the largest $a$ considered, that is,
$a = 1.5$, the RSV test is slightly more powerful. However, this happens
when both tests enjoy a large amount of power.

We now consider a second simulation study, based on the following model:

$Y_i = 1 + 0.5X_{i1} + aX_{i1}^2 + \exp(X_{i2}) + 0.15\exp(cX_{i1})\varepsilon_i$ (5.2)

($i = 1,\ldots,n$). As before, the covariates $X_{ij}$ ($j = 1,2$) are independent and follow a
uniform distribution on $[0,1]$ and the error $\varepsilon_i$ is independent of $X_i = (X_{i1},X_{i2})$ and follows
a standard normal distribution. We are interested in simultaneous testing of the
regression and variance function for several choices of $a$ and $c$. The null model corresponds to
$a = c = 0$, that is, under the null hypothesis we have a homoscedastic partial linear model.
The same choices for $n$, $\alpha$, $K$, $b$ and $h$ are taken as in the first simulation study. As before,
we carry out 300 simulations and, for each sample, 300 random resamples are generated.
The weight function is now given by $\pi(x) = \pi(x_1,x_2) = \prod_{j=1}^2 I(0.1 \le x_j \le 0.9)$.
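
A sketch of the second design is given below. It reads the third term of (5.2) as $aX_{i1}^2$ and the weight function as a product of indicators over the two coordinates; both readings, as well as the function names, are assumptions made for this illustration.

```python
import numpy as np


def simulate_model_52(n, a, c, rng):
    """Draw one sample of size n from model (5.2).

    Assumed form: Y = 1 + 0.5 X1 + a X1^2 + exp(X2) + 0.15 exp(c X1) e,
    with X1, X2 ~ Uniform[0, 1] independent and e ~ N(0, 1).
    """
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    e = rng.standard_normal(n)
    Y = (1.0 + 0.5 * X[:, 0] + a * X[:, 0] ** 2 + np.exp(X[:, 1])
         + 0.15 * np.exp(c * X[:, 0]) * e)
    return X, Y


def product_weight(X, lo=0.1, hi=0.9):
    """pi(x) = prod_j I(lo <= x_j <= hi), evaluated row by row."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return np.all((X >= lo) & (X <= hi), axis=1).astype(float)


if __name__ == "__main__":
    X, Y = simulate_model_52(100, 1.0, 2.0, np.random.default_rng(1))
    print(Y[:3], product_weight(X)[:3])
```

With `a = c = 0` the error standard deviation is constant at 0.15, which corresponds to the homoscedastic null model.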

The results are shown in Table 2. As far as we know, there is no competitor for this 
test in the literature. As is clear from the table, the rejection probabilities are close to 
the nominal level under the null hypothesis and increase when a, c and n get larger. 



Table 1. Rejection probabilities under the null hypothesis ($a = 0$) and under the alternative
hypothesis ($a > 0$). The test of Rodríguez-Poo, Sperlich and Vieu (2009) is indicated by 'RSV',
the new test is indicated by 'EL' for model (5.1)

                          g1(x1) = x1^2                g1(x1) = 2 log(x1 + 0.5)
                     n = 100        n = 200           n = 100        n = 200
g2(x2)       a      RSV    EL     RSV    EL          RSV    EL     RSV    EL
exp(x2)      0      0.047  0.053  0.040  0.043       0.047  0.053  0.040  0.043
             0.5    0.123  0.153  0.160  0.193       0.123  0.147  0.127  0.160
             1      0.377  0.420  0.653  0.683       0.387  0.400  0.657  0.660
             1.5    0.787  0.743  0.973  0.980       0.747  0.723  0.973  0.960
x2^2 + 1     0      0.033  0.037  0.043  0.053       0.033  0.037  0.043  0.053
             0.5    0.110  0.120  0.153  0.177       0.107  0.133  0.113  0.147
             1      0.373  0.397  0.667  0.657       0.407  0.440  0.660  0.713
             1.5    0.753  0.733  0.977  0.963       0.797  0.763  0.990  0.983

6. Assumptions and proofs

Assumptions.

(A.1) $K$ is a $d$-dimensional product kernel of the form $K(t_1,\ldots,t_d) = \prod_{j=1}^d k(t_j)$,
where $k$ is an $r$th-order ($r \ge 2$) univariate kernel (i.e., $k(t) \ge 0$ and $\int k(t)\,dt = 1$)
supported on $[-1,1]$ and $k$ is symmetric, bounded and Lipschitz continuous.

(A.2) The baseline smoothing bandwidth $h$ satisfies $nh^{d+2r} \to K$ for some $K \ge 0$,
$nh^{3d/2}\log^{-4}(n) \to \infty$, and $h_l/h \to \beta_l$ as $n \to \infty$, where
$c_0 \le \min_{1\le l\le k}\{\beta_l\} \le \max_{1\le l\le k}\{\beta_l\} \le c_1$ for finite and positive constants $c_0$ and $c_1$.
Moreover, $d < 4r$ and the weight function $\pi$ is bounded, Lipschitz continuous on its compact
support $S$ and satisfies $\int \pi(x)\,dx = 1$.

(A.3) Let $\varepsilon_i = Y_i - m(X_i,\theta_0,g_0) = (\varepsilon_{i1},\ldots,\varepsilon_{ik})^T$. $E(|\prod_{j=1}^6 \varepsilon_{il_j}| \mid X_i = x)$ is uniformly
bounded for all $l_1,\ldots,l_6 \in \{1,\ldots,k\}$ and all $x \in S$.

(A.4) $f(x)$ and all the $\sigma_{lj}(x)$'s have continuous derivatives up to the second order in
$S$, $\inf_{x\in S} f(x) > 0$ and $\min_l \inf_{x\in S} \sigma_{ll}(x) > 0$. Let $\xi_1(x)$ and $\xi_k(x)$ be the smallest and
largest eigenvalues of $V(x)$. We assume that $c_2 \le \inf_{x\in S}\xi_1(x) \le \sup_{x\in S}\xi_k(x) \le c_3$ for
finite and positive constants $c_2$ and $c_3$.

(A.5) $\Theta$ is a subspace of $R^p$, $P(\hat\theta \in \Theta) \to 1$ as $n \to \infty$, and $\hat\theta$ satisfies $\hat\theta - \theta_0 =$

(A.6) $m(x,\theta,g)$ is twice continuously partially differentiable with respect to the
components of $\theta$ and $x$ for all $g$, and the derivatives are bounded uniformly in $x \in S$, $\theta \in \Theta$
and $g \in \mathcal{G}$.

(A.7) The functions $\Gamma_{nl}(x)$ ($l = 1,\ldots,k$) appearing in the local alternative hypothesis
converge to $\Gamma_l(x)$ as $n \to \infty$, and $\Gamma_l(x)$ is uniformly bounded with respect to $x$.

Let $\Delta_l(x,\theta) = m_l(x,\theta,\hat g) - m_l(x,\theta,g_0)$ for $l = 1,\ldots,k$, $\Delta(x,\theta) = (\Delta_l(x,\theta))_{l=1}^k$,

$Q_i^{(2)}(x,\theta) = [K_{h_1}(x - X_i)\Delta_1(x,\theta),\ldots,K_{h_k}(x - X_i)\Delta_k(x,\theta)]^T$,

and let $\|\cdot\|$ be the Euclidean norm.
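
To make the kernel notation concrete, the sketch below builds a product kernel from a univariate kernel and its rescaled version $K_h(u) = h^{-d}K(u/h)$, which is the usual convention and is assumed here. The triangular kernel from Section 5 is used as the univariate kernel $k$; it is symmetric, bounded, Lipschitz continuous and supported on $[-1,1]$, which corresponds to $r = 2$. Function names are illustrative.

```python
import numpy as np


def k_tri(t):
    """Univariate triangular kernel k(t) = (1 - |t|) I(|t| <= 1)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 1.0 - np.abs(t), 0.0)


def product_kernel(t, k=k_tri):
    """K(t_1, ..., t_d) = prod_j k(t_j); t has shape (..., d)."""
    return np.prod(k(np.asarray(t, dtype=float)), axis=-1)


def scaled_kernel(u, h, k=k_tri):
    """K_h(u) = h^{-d} K(u / h) for a scalar bandwidth h (assumed convention)."""
    u = np.asarray(u, dtype=float)
    d = u.shape[-1]
    return product_kernel(u / h, k) / h ** d


if __name__ == "__main__":
    print(product_kernel([0.3, -0.4]), scaled_kernel([0.1, 0.1], h=0.5))
```

In the multiresponse setting each response component $l$ has its own bandwidth $h_l$, so `scaled_kernel` would be called once per component with $h = h_l$.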

Table 2. Rejection probabilities under the null hypothesis ($a = c = 0$) and under the
alternative hypothesis ($a, c > 0$) for model (5.2)

  a      c      n = 100   n = 200
  0      0      0.053     0.043
  0.5    1      0.080     0.130
  1      2      0.203     0.310
  2      4      0.550     0.757
  3      6      0.853     0.913

The following conditions specify stochastic orders for some quantities involving
$Q_i^{(2)}(x,\theta_0)$. They can be verified for particular choices of the null model. In Section 4, we
show that these conditions hold true for the common semi-parametric regression models,
like the single index and partial linear model.

(B.1) $\max_{i,l}|m_l(X_i,\theta_0,\hat g) - m_l(X_i,\theta_0,g_0)| = O_p\{(nh^d)^{-1/2}\log(n)\}$.

(B.2) max,,/|^I2i(|^-^irii%^|=0p(l). 

(B.3) max,/|| "^-^(^^:^-g) ||=Op(nV2). 

(B.4) $\sup_{x\in S}\|E\{m(x,\theta_0,\hat g)\} - m(x,\theta_0,g_0)\| = o(h^r)$.

(B.5) $P(\hat g \in \mathcal{G}) \to 1$ as $n \to \infty$.

Because of space restrictions, the proofs of the next five lemmas are omitted. They 
can be found in a technical report, available from the authors (Chen and Van Keilegom 
(2009)). 

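Lemma 6.1. Assumption (B.1) implies (B.1a)-(B.1c), given by

(B.1a) $\sup_{x\in S}[n^{-1}\sum_{i=1}^n Q_i^{(2)}(x,\theta_0)Q_i^{(2)T}(x,\theta_0)] = O_p\{n^{-1}h^{-2d}\log^2(n)\}$.

(B.1b) $\sup_{x\in S}\max_{1\le i\le n}\|Q_i^{(2)}(x,\theta_0)\| = O_p\{n^{-1/2}h^{-3d/2}\log(n)\}$.

(B.1c) $\sup_{x\in S}[n^{-1}\sum_{i=1}^n Q_i^{(2)}(x,\theta_0)] = O_p\{(nh^d)^{-1/2}\log(n)\}$.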

Next, we consider the uniform rate of convergence of the Lagrange multiplier $\lambda(x)$.
Lemma 6.2. Assume (A.1)-(A.6) and (B.1)-(B.5). Then, under $H_0$,

$\sup_{x\in S}\|\lambda(x)h^{-d}\| = O_p\{(nh^d)^{-1/2}\log(n)\}$.

The following lemma gives a one-step expansion for $\lambda(x)$.
Lemma 6.3. Assume (A.1)-(A.6) and (B.1)-(B.5). Then, under $H_0$,

$\lambda(x)h^{-d} = V^{-1}(x)\bar Q(x,\hat\theta) + O_p\{(nh^d)^{-1}\log^2(n)\}$, (6.1)

uniformly with respect to $x \in S$.

We next derive an expansion of the EL ratio statistic.
Lemma 6.4. Assume (A.1)-(A.6) and (B.1)-(B.5). Then, under $H_0$,

$\ell\{\tilde m(x,\hat\theta,\hat g)\} = nh^d\bar Q^T(x,\hat\theta)V^{-1}(x)\bar Q(x,\hat\theta) + q_n(x,\hat\theta) + o_p(h^{d/2})$,

uniformly with respect to $x \in S$, where

$q_n(x,\theta) = nh^d\bar Q^T(x,\theta)\{(h^dS_n(x,\theta))^{-1} - V^{-1}(x)\}\bar Q(x,\theta) + \tfrac{2}{3}nh^db_n(x)$.

Applying Lemma 6.4, it can be shown that the EL test statistic can be written as

$\Lambda_n(h) = nh^d\int \bar Q^T(x,\theta_0)V^{-1}(x)\bar Q(x,\theta_0)\pi(x)\,dx + R_n + o_p(h^{d/2})$, (6.2)

where

$R_n = \int q_n(x,\hat\theta)\pi(x)\,dx + 2nh^d\int \bar Q^T(x,\theta_0)V^{-1}(x)\{\bar Q(x,\hat\theta) - \bar Q(x,\theta_0)\}\pi(x)\,dx$

$\quad + nh^d\int \{\bar Q(x,\hat\theta) - \bar Q(x,\theta_0)\}^TV^{-1}(x)\{\bar Q(x,\hat\theta) - \bar Q(x,\theta_0)\}\pi(x)\,dx$.

Let us consider the order of $R_n$.
Lemma 6.5. Assume (A.1)-(A.6) and (B.1)-(B.5). Then, under $H_0$, $R_n = o_p(h^{d/2})$.
Lemma 6.6. Under assumptions (A.1)-(A.6) and (B.1)-(B.5), and under $H_0$,

$\Lambda_n(h) = \Lambda_{n1}(h) + o_p(h^{d/2})$, (6.3)

where $\Lambda_{n1}(h) = nh^d\int \bar Q^{(1)T}(x,\theta_0)V^{-1}(x)\bar Q^{(1)}(x,\theta_0)\pi(x)\,dx$.
Proof. Lemma 6.5 and (6.2) lead to

$\Lambda_n(h) = \Lambda_{n1}(h) + 2nh^d\int \bar Q^{(1)T}(x,\theta_0)V^{-1}(x)\bar Q^{(2)}(x,\theta_0)\pi(x)\,dx$

$\quad + nh^d\int \bar Q^{(2)T}(x,\theta_0)V^{-1}(x)\bar Q^{(2)}(x,\theta_0)\pi(x)\,dx + o_p(h^{d/2})$.

Applying the same analysis to the term $D_{n3}(x)$ in the proof of Lemma 6.5, we have

$nh^d\int \bar Q^{(2)T}(x,\theta_0)V^{-1}(x)\bar Q^{(2)}(x,\theta_0)\pi(x)\,dx = O_p\{(nh^d)^{-1}\log^4(n)\} = o_p(h^{d/2})$.

It remains to check the order of $\Lambda_{n2}(h) = nh^d\int \bar Q^{(1)T}(x,\theta_0)V^{-1}(x)\bar Q^{(2)}(x,\theta_0)\pi(x)\,dx$.
Applying the same style of derivation as for $D_{n1}(x)$, it can be shown that
$\Lambda_{n2}(h) = o_p(h^{d/2})$. This finishes the proof. □



Proof of Theorem 3.1. First note that

$\Lambda_{n1}(h) = n^{-1}h^d\sum_{i,j}^{n}\sum_{l,t}^{k}\int K_{h_l}(x - X_i)K_{h_t}(x - X_j)\tilde\varepsilon_{il}(x)\tilde\varepsilon_{jt}(x)\gamma_{lt}(x)f^{-1}(x)\pi(x)\,dx$,

where $\tilde\varepsilon_{il}(x) = Y_{il} - \tilde m_l(x,\theta_0,g_0)$. Let $K^{(2)}(\beta_l,\beta_t,u) = \beta_t^{-d}\int K(z)K(\beta_lz/\beta_t + u)\,dz$, which is
a generalization of the standard convolution of $K$ to accommodate different bandwidths
and is symmetric with respect to $\beta_l$ and $\beta_t$. By a change of variable and noticing that $K$
is a compact kernel supported on $[-1,1]^d$, $\Lambda_{n1}(h) = \Lambda_{n11}(h)\{1 + O_p(h^2)\}$, where



i,3 l,t 



2_^2^<^il^3tl\' 'iPhPt, — 7— 
z^j l,t ^ * 



«=1 l.t i^^' 



f{X^)f{X,) 



(6.4) 



It is straightforward to show that $\Lambda_{n112}(h) = k + o_p(h^{d/2})$. Thus, it contributes only
to the mean of the test statistic. As $\Lambda_{n111}(h)$ is a degenerate $U$-statistic with kernel
depending on $n$, straightforward but lengthy calculations lead to

The establishment of the above asymptotic normality can be achieved by either an
approach using the martingale central limit theorem (Hall and Heyde (1980)) as
demonstrated in Hall (1984) or an approach using the generalized quadratic forms (de Jong
(1987)) as demonstrated in Härdle and Mammen (1993). Note that
$(nh^d)^{-1}\log^4(n) = o(h^{d/2})$. Applying Slutsky's theorem leads to the result. □
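
The generalized convolution used in this proof can be checked numerically in the univariate case. The sketch below assumes the form $K^{(2)}(\beta_l,\beta_t,u) = \beta_t^{-d}\int K(z)K(\beta_lz/\beta_t + u)\,dz$ with $d = 1$ and the triangular kernel, and approximates the integral by the trapezoidal rule; the function name `k2` and the grid size are illustrative choices.

```python
import numpy as np


def k_tri(t):
    """Univariate triangular kernel supported on [-1, 1]."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 1.0 - np.abs(t), 0.0)


def k2(beta_l, beta_t, u, k=k_tri, m=20001):
    """Approximate beta_t^{-1} * int k(z) k(beta_l z / beta_t + u) dz for d = 1.

    The integral runs over [-1, 1], the support of k, and is evaluated
    with the trapezoidal rule on m equally spaced points.
    """
    z = np.linspace(-1.0, 1.0, m)
    vals = k(z) * k(beta_l * z / beta_t + u)
    dz = z[1] - z[0]
    integral = np.sum((vals[:-1] + vals[1:]) * 0.5) * dz
    return integral / beta_t


if __name__ == "__main__":
    print(k2(1.0, 1.0, 0.0))                      # ordinary convolution at 0
    print(k2(1.0, 2.0, 0.0), k2(2.0, 1.0, 0.0))   # swapping the betas at u = 0
```

With equal bandwidth ratios the function reduces to the ordinary convolution of the kernel with itself, whose value at $u = 0$ is $\int k^2(z)\,dz = 2/3$ for the triangular kernel; at $u = 0$ the value is also unchanged when $\beta_l$ and $\beta_t$ are swapped.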

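Proof of Theorem 3.2. It can be checked that given the original sample
$\mathcal{X}_n = \{(X_i,Y_i)\}_{i=1}^n$, versions of assumptions (B.1)-(B.5) are true for the bootstrap
resample. Hence Lemmas 6.2-6.6 are valid for the resample given $\mathcal{X}_n$. In particular, let
$\bar Q^*(x,\hat\theta)$ be the bootstrap version of $\bar Q(x,\theta_0)$, let $\hat V(x) = \hat f(x)(\beta_j^{-d}R(\beta_l/\beta_j)\hat\sigma_{lj}(x))_{k\times k}$,
where $\hat\sigma_{lj}(x) = \hat f^{-1}(x)n^{-1}\sum_{i=1}^n K_h(x - X_i)\hat\varepsilon_{il}\hat\varepsilon_{ij}$, $\hat f(x) = n^{-1}\sum_{i=1}^n K_h(x - X_i)$, and let
$(\hat\gamma_{lj}(x))_{k\times k} = \hat f(x)\hat V^{-1}(x)$. Then, conditional on $\mathcal{X}_n$,
$\ell\{\tilde m(x,\hat\theta^*,\hat g^*)\} = nh^d\bar Q^{*T}(x,\hat\theta)\hat V^{-1}(x)\bar Q^*(x,\hat\theta) + o_p(h^{d/2})$ and
$\Lambda_n^*(h) = \Lambda_{n11}^*(h) + o_p(h^{d/2})$, where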



n k 



which are respectively the bootstrap versions of (6.3) and (6.4). 

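Then apply the central limit theorem for degenerate $U$-statistics as in the proof of
Theorem 3.1: conditional on $\mathcal{X}_n$, $h^{-d/2}(\Lambda_{n11}^* - k) \stackrel{d}{\to} N(0,\sigma^2(K,\hat\Sigma))$, where $\sigma^2(K,\hat\Sigma)$ is
$\sigma^2(K,\Sigma)$ with $\Sigma(x)$ replaced by $\hat\Sigma(x) = (\hat\sigma_{lj}(x))_{k\times k}$. This implies that

$h^{-d/2}(\Lambda_n^* - k) \stackrel{d}{\to} N(0,\sigma^2(K,\hat\Sigma))$. (6.5)

Let $\hat Z \sim N(0,\sigma^2(K,\hat\Sigma))$ and $Z \sim N(0,\sigma^2(K,\Sigma))$, and let $\hat z_\alpha$ and $z_\alpha$ be the upper-$\alpha$
quantiles of $N(0,\sigma^2(K,\hat\Sigma))$ and $N(0,\sigma^2(K,\Sigma))$, respectively. Recall that $\hat q_{n\alpha}$ and $q_{n\alpha}$ are,
respectively, the upper-$\alpha$ quantiles of $h^{-d/2}(\Lambda_n^* - k)$ given $\mathcal{X}_n$ and of $h^{-d/2}(\Lambda_n - k)$. As (6.5)
implies that $1 - \alpha = P(h^{-d/2}(\Lambda_n^* - k) < \hat q_{n\alpha}|\mathcal{X}_n) = P(\hat Z < \hat q_{n\alpha}) + o(1)$, it follows that
$\hat q_{n\alpha} = \hat z_\alpha + o(1)$ conditionally on $\mathcal{X}_n$. A similar argument using Theorem 3.1 leads
to $q_{n\alpha} = z_\alpha + o(1)$. As $\hat\Sigma(x) \stackrel{p}{\to} \Sigma(x)$ uniformly in $x \in S$, we have $\sigma^2(K,\hat\Sigma) \stackrel{p}{\to} \sigma^2(K,\Sigma)$, and
hence $\hat z_\alpha = z_\alpha + o(1)$. Therefore, $\hat q_{n\alpha} = q_{n\alpha} + o_p(1)$ and this completes the proof. □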

Proof of Theorem 3.3. It can be shown that Lemmas 6.2-6.6 continue to hold true
when we work under the local alternative $H_{1n}$. In particular, (6.3) is still valid. By
using a derivation that closely resembles the one leading to (6.4), we have
$\Lambda_n(h) = \{\Lambda_{n11}(h) + \Lambda^a_{n112}(h) + \Lambda^a_{n113}(h)\}\{1 + O_p(h^2)\} + o_p(h^{d/2})$, where $\Lambda_{n11}(h)$ is defined in
(6.4),

k ^ n 

i,t •' i,] 

X {ejt^niiXi) + eiiTnt{X;j)}dx 

and 

k ^ n 
l.t ij 

x^it{x)r„i{X,)r„t{Xj)7r{x)r\x) dx. 
It can be shown that E{Af^ii2W} — a-nd that 

E{A'},,,sm = in~l)h^clj rl{x)V-\x)rn{x)f{xMx)dx 

+ 4/3r'' fY.^^f^i/M'r"i^''^^'t(''^'^"tixMx)dx{l + Oih^)} 
l,t 

(6-6) 

$= h^{d/2}\beta(f,K,\Sigma,\Gamma) + O(c_n^2 + nh^{d+2}c_n^2) + o(h^{d/2})$.

It is fairly easy to see that $\Lambda^a_{n112}(h) = o_p(h^{d/2})$ and
$\Lambda^a_{n113}(h) = h^{d/2}\beta(f,K,\Sigma,\Gamma) + o_p(h^{d/2})$. From Lemma 6.6,
$h^{-d/2}\{\Lambda_{n11}(h) - k\} \stackrel{d}{\to} N(0,\sigma^2(K,\Sigma))$. The theorem now
follows after combining these results. □

Proof of Theorem 3.4. The proof follows directly from Theorem 3.3 and from the fact
that $\hat q_{n\alpha} = q_{n\alpha} + o_p(1)$, which is established in the proof of Theorem 3.2. □